Cooperative Crawling
Author
Abstract
Web crawler design presents many challenges: architecture, visit strategies, performance, and more. One of the most important research topics concerns improving the selection of web pages that are "interesting" for the user, according to importance metrics. Another relevant point is content freshness, i.e. maintaining the freshness and consistency of temporarily stored copies; for this, the crawler periodically revisits the stored content (the re-crawling process). In this paper, we propose a scheme that lets a crawler acquire information about the global state of a website before the crawling process takes place. The scheme requires web server cooperation: the server collects and publishes information about its content that a crawler can use to tune its visit strategy. If this information is unavailable or out of date, the crawler simply acts in the usual manner. In this sense the proposed scheme is non-invasive and independent of any particular crawling strategy and architecture.
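The idea of a server-published summary that a crawler consults before visiting can be sketched as follows. This is a minimal illustration, not the paper's actual protocol: it assumes the server exposes a manifest mapping URLs to last-modified timestamps, which the crawler compares against its stored copies to decide what to re-crawl.

```python
def plan_recrawl(server_manifest: dict, local_cache: dict) -> list:
    """Return URLs that are new or stale according to the server manifest.

    server_manifest: url -> last-modified timestamp published by the server
    local_cache:     url -> timestamp of the locally stored copy
    """
    to_fetch = []
    for url, remote_ts in server_manifest.items():
        local_ts = local_cache.get(url)
        # Fetch pages we have never seen, or whose stored copy is older
        # than what the server advertises.
        if local_ts is None or local_ts < remote_ts:
            to_fetch.append(url)
    return sorted(to_fetch)

manifest = {"/index.html": 100, "/news.html": 250, "/about.html": 90}
cache = {"/index.html": 100, "/news.html": 200}
print(plan_recrawl(manifest, cache))  # → ['/about.html', '/news.html']
```

If the manifest is missing, the crawler falls back to its usual blind visit strategy, which is what makes the scheme non-invasive.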
Similar resources
Distributed, Interleaved, Parallel and Cooperative Search in Constraint Satisfaction Networks
In this work, we extend the efficiency of distributed search in constraint satisfaction networks. Our method adds interleaving and parallelism to distributed backtrack search. Moreover, it has a filtering capacity that makes it open to cooperative work. Experiments show that 1) the shape of the phase transition with random problems can be characterized, 2) important speed-ups can be achieved w...
Prioritize the ordering of URL queue in Focused crawler
The enormous growth of the World Wide Web in recent years has made it necessary to perform resource discovery efficiently. For a crawler, downloading only domain-specific web pages is not a simple task, and an unfocused approach often yields undesired results. Therefore, several new ideas have been proposed; among them, a key technique is focused crawling, which is able to crawl particular topical...
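Prioritized ordering of the URL queue, as this snippet describes, can be sketched with a max-priority frontier. The class name, scores, and URLs below are illustrative assumptions, not taken from the paper:

```python
import heapq

class FrontierQueue:
    """Priority frontier: pop the URL with the highest relevance score first."""

    def __init__(self):
        self._heap = []
        self._counter = 0  # tie-breaker keeps pops deterministic

    def push(self, url: str, score: float):
        # heapq is a min-heap, so negate the score for max-first ordering.
        heapq.heappush(self._heap, (-score, self._counter, url))
        self._counter += 1

    def pop(self) -> str:
        return heapq.heappop(self._heap)[2]

q = FrontierQueue()
q.push("http://example.com/sports", 0.2)
q.push("http://example.com/python", 0.9)
q.push("http://example.com/news", 0.5)
print(q.pop())  # → http://example.com/python (most relevant first)
```

A focused crawler would compute the score from the page's topical relevance; here the scores are simply supplied by the caller.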
Information Sharing among Heterogeneous Reusable Agents in Cooperative Distributed Search
Information sharing among heterogeneous reusable agents in cooperative distributed search systems can greatly affect the quality of solutions and the runtime efficiency of the system. In this paper, we first give a formal description of shareable information in systems where agents have private knowledge and databases and where agents are specifically intended to be reusable. We then present e...
Parallel Web Spiders for Cooperative Information Gathering
A web spider is a widely used means of obtaining information for search engines. As the size of the Web grows, parallelizing the spider's crawling process becomes a natural choice. This paper presents a parallel web spider model based on a multi-agent system for cooperative information gathering. It uses a dynamic assignment mechanism to eliminate redundant web pages caused by parallelization....
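The deduplication side of such a parallel spider can be illustrated with a shared URL registry that lets only the first spider claim a page. The class and method names are hypothetical; the paper's actual dynamic assignment mechanism is not detailed in this snippet:

```python
from threading import Lock

class UrlRegistry:
    """Shared registry: claim() succeeds only for the first spider to see a URL."""

    def __init__(self):
        self._seen = set()
        self._lock = Lock()  # serialize access across spider threads

    def claim(self, url: str) -> bool:
        with self._lock:
            if url in self._seen:
                return False  # another spider already owns this page
            self._seen.add(url)
            return True

registry = UrlRegistry()
print(registry.claim("http://example.com/a"))  # → True  (first claim wins)
print(registry.claim("http://example.com/a"))  # → False (redundant fetch avoided)
```

Each spider calls claim() before fetching, so a URL discovered by several spiders in parallel is downloaded only once.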
Subjective partial cooperation in multi-agent local search
A recently proposed partial cooperation model offers a balance between the two extreme scenarios commonly assumed in multi-agent systems: completely competitive or fully cooperative agents. Partially cooperative agents act cooperatively in a distributed search process as long as the outcome satisfies some threshold on their personal utility; otherwise, they act selfishly. While p...
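The threshold rule described in this snippet amounts to a one-line decision function. The function name and utility values below are illustrative assumptions:

```python
def choose_action(cooperative_utility: float, threshold: float) -> str:
    """Partially cooperative agent: join the distributed search while the
    cooperative outcome keeps personal utility at or above the threshold,
    otherwise fall back to selfish behavior."""
    return "cooperate" if cooperative_utility >= threshold else "selfish"

print(choose_action(0.8, 0.5))  # → cooperate (outcome good enough)
print(choose_action(0.3, 0.5))  # → selfish   (below personal threshold)
```

Setting the threshold to 0 recovers a fully cooperative agent, while a very high threshold recovers a purely competitive one.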